Spatial environmental factors predict cardiovascular and all-cause mortality: Results of the SPACE study

Background Environmental exposures account for a growing proportion of global mortality. Large cohort studies are needed to characterize the independent impact of environmental exposures on mortality in low-income settings.
Methods We collected data on individual and environmental risk factors for a multiethnic cohort of 50,045 individuals in a low-income region in Iran. Environmental risk factors included: ambient fine particulate matter air pollution; household fuel use and ventilation; proximity to traffic; distance to percutaneous coronary intervention (PCI) center; socioeconomic environment; population density; local land use; and nighttime light exposure. We developed a spatial survival model to estimate the independent associations between these environmental exposures and all-cause and cardiovascular mortality.
Findings Several environmental factors demonstrated associations with mortality after adjusting for individual risk factors. Ambient fine particulate matter air pollution predicted all-cause mortality (per μg/m3, HR 1.20, 95% CI 1.07, 1.36) and cardiovascular mortality (HR 1.17, 95% CI 0.98, 1.39). Biomass fuel use without chimney predicted all-cause mortality (reference = gas, HR 1.23, 95% CI 0.99, 1.53) and cardiovascular mortality (HR 1.36, 95% CI 0.99, 1.87). Kerosene fuel use without chimney predicted all-cause mortality (reference = gas, HR 1.09, 95% CI 0.97, 1.23) and cardiovascular mortality (HR 1.19, 95% CI 1.01, 1.41). Distance to PCI center predicted all-cause mortality (per 10km, HR 1.01, 95% CI 1.004, 1.022) and cardiovascular mortality (HR 1.02, 95% CI 1.004, 1.031). Additionally, proximity to traffic predicted cardiovascular mortality (HR 1.13, 95% CI 1.01, 1.27). In a separate validation cohort, the multivariable model effectively predicted both all-cause mortality (AUC 0.76) and cardiovascular mortality (AUC 0.81). Population attributable fractions demonstrated a high mortality burden attributable to environmental exposures.
Interpretation Several environmental factors predicted cardiovascular and all-cause mortality, independent of each other and of individual risk factors. Mortality attributable to environmental factors represents a critical opportunity for targeted policies and programs.

A growing list of environmental factors presents particular risks to cardiovascular health [1][2][3][4]. Ambient fine particulate matter air pollution (PM2.5) from traffic, industry, fires, and dust is a risk factor for all-cause mortality, cardiovascular mortality, ischemic heart disease (IHD), and stroke [7,8]. In 2019, ambient PM2.5 ranked seventh among all health risk factors for mortality, responsible for 4.14 million deaths, of which 2.47 million were from CVD [5,6]. Similarly, household air pollution from inefficient stoves and solid fuels was responsible for 2.31 million deaths (1.07 million from CVD) [5,6]. Proximity to traffic pollution and noise is associated with increased rates of adverse CVD events, particularly IHD and stroke [4,7,9]. Distance to health care services affects access to preventive care, tertiary care, and emergent percutaneous coronary interventions (PCI) [10,11]. Socioeconomic environment is an independent risk factor for CVD, even after controlling for individual socioeconomic status [12]. Population density has been shown to be both positively and negatively correlated with CVD in different environments [13,14]. Exposures to artificial light at night cause circadian dysregulation and have been associated with ischemic heart disease outcomes [4,15,16]. Finally, land use (e.g., greenspaces, mixed land use) may predict CVD through association with physical activity, social engagement, and access to health services [17,18]. Together, these variables act directly and indirectly to precipitate cardiovascular disease and mortality (Fig 1).
Understanding the relationships between environmental risk factors and health is a critical step towards designing targeted policies and programs to reduce the immense burden of attributable disease. To date, most investigations of environmental factors have studied single risk factors in high-income settings. We therefore developed a spatial environmental model to provide a better understanding of the independent associations between multiple spatial environmental factors and mortality, within a low-income population in the middle-income country of Iran.

Study setting
We analyzed data from participants enrolled in the Golestan Cohort Study (GCS) in Golestan Province, Iran, a middle-income country with diverse ethnicities and lifestyles. In both Iran and Golestan, CVD is the leading cause of death and disability [19].

Study participants
The GCS enrolled 50,045 individuals (28,811 females and 21,234 males) across northeastern Golestan from 2004 to 2008 [20]. Participants ranged in age from 40 to 75 years in order to capture individuals with higher rates of non-communicable disease, particularly esophageal cancer. Approximately 80% were enrolled from 326 rural villages ranging in size from 20 to 150 residents. The remaining 20% were selected randomly from Gonbad City, the second-largest city in Golestan with a population of approximately 130,000. Exclusion criteria were: unwillingness to participate for any reason; being a temporary resident; or having a previous diagnosis of upper gastrointestinal cancer. Among those selected to enroll, participation rates were approximately 80% for women and 65% for men. Participants were followed up actively every 12 months, with a follow-up success rate of 99%. Study methods were approved by ethics review committees of the Tehran University of Medical Sciences, the International Agency for Research on Cancer, and the National Cancer Institute. All participants signed a written informed consent at enrollment. The full study protocol is publicly available [20].

Individual characteristics
Individual baseline characteristics were collected at time of enrollment in the GCS [20]. Participants were interviewed by a physician in their native language and completed a detailed lifestyle questionnaire and physical exam. The following covariates were included in our analysis: age, sex, ethnicity, marital status, education, socioeconomic status, waist and hip circumference, physical activity, medical history (IHD, stroke, diabetes, hypertension), and substance use (tobacco, alcohol, and opium). Details on the socioeconomic status score can be found in S1 File.
At baseline, individuals enrolled in the GCS were, on average, 52.1 years of age (SD 8.9 years) and 58% female. The majority was married (88%) and illiterate (70%). A minority had a history of IHD or stroke (6%), diabetes (7%), hypertension (20%), tobacco use (22%), opium use (17%), or alcohol use (3%). Three-quarters were of Turkmen ethnicity and the remainder were of other ethnic groups (e.g., Persians, Baluchis, Qizilbash), consistent with the prevalence of ethnic groups in the sampled region. The derivation and validation cohorts were similar in the distribution of risk factors (Table 1).

Mortality data
In the GCS, nearly 100% of deaths are captured via documents collected from participants, hospital files, and verbal autopsy questionnaires, which are reviewed by at least two independent internists to define diagnoses according to the International Classification of Diseases, 10th Revision (ICD-10) codes [21,22]. All-cause mortality included all reported deaths. Cardiovascular mortality included deaths attributable to IHD (ICD-10 codes I20-25), cerebrovascular disease (I60-69), cardiac arrest (I46), congestive heart failure (I50), hypertensive diseases (I10-15), chronic rheumatic heart diseases (I05-09), pulmonary heart disease (I26-28), and other cardiovascular system diseases not otherwise specified. Complete case analysis was used for missing data.

Individual geocodes
Each individual was assigned a geocode (latitude and longitude) based on residence location. Details on geocode assignment can be found in S1 File.

Spatial environmental factors
We developed exposure variables for eight prespecified spatial environmental factors (SEFs) using a combination of GCS data and publicly available datasets [20,23-28]. These variables were chosen based on a review of the literature on environmental risks for cardiovascular disease [1][2][3][4], as well as data availability. The SEFs were: ambient air pollution [24,25], household fuel use and ventilation [20], socioeconomic environment [23], proximity to traffic (within 100m of a minor highway or within 500m of a major highway) [20], distance to percutaneous coronary intervention centers [20], population density [26], nighttime light exposure [27], and land use [28]. Environmental exposures were assigned according to year of enrollment. Some potential environmental hazards were not included because no data were available for the study region (e.g., noise pollution; toxins in water and food). All SEFs were included in the final multivariable model. Data sources and related methodologies are summarized in S1 Table in S1 File. Please see Data Sharing Statement for information on how to access GCS-specific data.
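As an illustration of how two of these exposures might be derived from a geocode, the following Python sketch computes the great-circle distance to the nearest PCI center and applies the traffic-proximity thresholds stated above. The function names, the spherical-Earth radius, and the example coordinates are assumptions for illustration only; they are not the study's data or GIS workflow, which is described in S1 File.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest_center_km(geocode, centers):
    """Smallest great-circle distance from one geocode to any center."""
    lat, lon = geocode
    return min(haversine_km(lat, lon, c_lat, c_lon) for c_lat, c_lon in centers)


def near_traffic(dist_minor_m, dist_major_m):
    """Threshold rule from the text: within 100 m of a minor highway
    or within 500 m of a major highway."""
    return dist_minor_m <= 100 or dist_major_m <= 500


# Hypothetical coordinates, not taken from the study.
village = (37.25, 55.17)
pci_centers = [(36.84, 54.44), (36.30, 59.60)]
print(round(distance_to_nearest_center_km(village, pci_centers), 1), "km")
print(near_traffic(dist_minor_m=150, dist_major_m=400))
```

Straight-line distance is a simplification; travel time along the road network may differ substantially in mountainous or sparsely connected areas.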

Derivation and validation cohorts
The GCS dataset was randomized into derivation and validation cohorts, stratified to ensure mortality was balanced in both groups, following established methodology [29]. The derivation cohort (90%) was used to construct the multivariable spatial environmental model; the validation cohort (10%) was used subsequently to test the model's predictive value. The derivation cohort contained 45,042 individuals and 5996 deaths, of which 2733 were cardiovascular deaths. The validation cohort contained 5003 individuals and 655 deaths, of which 286 were cardiovascular deaths. A summary of the study design is shown in S1 Fig in S1 File.
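A stratified 90/10 split of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure from [29]: the function name, the `died` field, the seed, and the toy cohort are all hypothetical.

```python
import random


def stratified_split(records, outcome_key, validation_fraction=0.10, seed=0):
    """Randomly split records into (derivation, validation) so that each
    outcome stratum contributes the same fraction to the validation set."""
    rng = random.Random(seed)
    strata = {}
    for rec in records:
        strata.setdefault(rec[outcome_key], []).append(rec)

    derivation, validation = [], []
    for group in strata.values():
        shuffled = group[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        n_val = round(len(shuffled) * validation_fraction)
        validation.extend(shuffled[:n_val])
        derivation.extend(shuffled[n_val:])
    return derivation, validation


# Toy cohort (hypothetical): 1000 people, 200 deaths.
cohort = [{"id": i, "died": i % 5 == 0} for i in range(1000)]
derivation, validation = stratified_split(cohort, "died")
print(len(derivation), len(validation))
print(sum(r["died"] for r in validation), "deaths in validation")
```

Stratifying on the outcome keeps the death rate essentially identical in both cohorts, which matters most when the validation cohort is small.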

Statistical analyses
We developed spatial environmental survival models to study the association of each SEF with the hazard of mortality, for both all-cause and cardiovascular mortality. All eight SEFs were tested simultaneously in the multivariable survival model, so that each was adjusted for the others. Additionally, we adjusted for common individual risk factors, including sex, age, individual socioeconomic status, anthropometric measures, and history of cardiovascular disease, hypertension, diabetes, and smoking. We used a spatial random effects survival model [30][31][32] (e.g., shared frailty model) to model time to mortality. For an individual j in geocode i with censoring time t given covariates $x_{ij}$, the survival function in the frailty model is: $S(t \mid x_{ij}) = S_0(t)^{\exp(x_{ij}^\top \beta + \nu_i)}$. Here, $S_0(t)$ is the baseline survival function, $\beta$ is a vector of regression coefficients, and $\nu_i$ is a frailty parameter (the random effect) for geocode i, assumed to have a Gaussian distribution. Environmental exposures demonstrated low correlation at the geocode level, reducing the risk of collinearity and model overfitting. Computations were performed in R (packages 'survival' and 'spBayesSurv') [33].
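To make the role of the frailty term concrete, the sketch below evaluates a survival probability under the proportional hazards form S(t | x) = S0(t)^exp(x'β + ν), which is assumed here as the standard shared frailty specification. It is written in Python for illustration (the study's computations were done in R), and the baseline survival, coefficient, and frailty values are hypothetical.

```python
import math


def frailty_survival(baseline_survival, covariates, beta, frailty):
    """Survival probability S0(t) ** exp(x . beta + v) for one individual.

    baseline_survival: S0(t) at the time of interest, between 0 and 1
    covariates, beta:  equal-length sequences of covariate values and coefficients
    frailty:           geocode-level random effect v (0 means an average geocode)
    """
    linear_predictor = sum(x * b for x, b in zip(covariates, beta)) + frailty
    return baseline_survival ** math.exp(linear_predictor)


# Hypothetical values: one binary exposure with hazard ratio 1.2.
s0 = 0.90
beta = [math.log(1.2)]
print(frailty_survival(s0, [0.0], beta, frailty=0.0))  # unexposed, average geocode
print(frailty_survival(s0, [1.0], beta, frailty=0.0))  # exposed, average geocode
print(frailty_survival(s0, [1.0], beta, frailty=0.3))  # exposed, higher-risk geocode
```

A positive frailty multiplies the hazard for everyone in that geocode by exp(ν), so residents of the same village share an unexplained risk component beyond their measured covariates.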
We report the exponentiated coefficients as estimates of the hazard ratios, along with their 95% confidence intervals.
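The conversion from a log-hazard coefficient to a hazard ratio and interval can be sketched as follows. This Python example assumes a Wald-type interval built from a coefficient and its standard error; the text does not state how the reported intervals were computed, and the coefficient and standard error below are hypothetical.

```python
import math


def hazard_ratio_ci(coef, se, z=1.959964):
    """Return (HR, lower, upper) by exponentiating coef and coef +/- z * se.

    z defaults to the 97.5th percentile of the standard normal (95% interval).
    """
    return (math.exp(coef),
            math.exp(coef - z * se),
            math.exp(coef + z * se))


# Hypothetical coefficient and standard error on the log-hazard scale.
hr, lower, upper = hazard_ratio_ci(coef=0.10, se=0.04)
print(f"HR {hr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the hazard ratio itself, and an interval that includes 1 corresponds to a coefficient interval that includes 0.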
To account for spatial dependence, as a sensitivity analysis we also fitted Bayesian survival models that adjust for spatial autocorrelation based on distance between geocodes [34,35]. Gaussian random field priors were specified on the frailty parameters to allow for autocorrelation between neighboring geocodes. We validated the model using recent methodologies for external validation [36]. First, we used a chi-squared test to determine whether the addition of SEFs improved the model's goodness-of-fit beyond traditional risk factors in the derivation cohort. Next, time-dependent area-under-the-curve analysis was used to test how well the spatial frailty model predicted all-cause and cardiovascular mortality in the novel validation cohort.
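The idea behind a time-dependent area under the curve can be illustrated with a deliberately simplified Python sketch. It treats anyone who died by a chosen horizon as a case and anyone still under follow-up beyond it as a control, and it omits the censoring weights that published estimators use, so it is an assumption-laden teaching example rather than the method of [36]. All names and data are hypothetical.

```python
def auc_at_horizon(times, events, risk_scores, horizon):
    """Simplified cumulative/dynamic AUC at `horizon`.

    Cases:    died (event == 1) at or before the horizon.
    Controls: still under observation after the horizon.
    People censored before the horizon are dropped (no censoring weights).
    Returns the fraction of case-control pairs in which the case has the
    higher predicted risk, counting ties as one half.
    """
    cases = [r for t, e, r in zip(times, events, risk_scores) if e and t <= horizon]
    controls = [r for t, r in zip(times, risk_scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")

    concordant = 0.0
    for risk_case in cases:
        for risk_control in controls:
            if risk_case > risk_control:
                concordant += 1.0
            elif risk_case == risk_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical follow-up times (years), death indicators, and model risk scores.
times = [2.0, 3.0, 8.0, 9.0]
events = [1, 1, 0, 0]
risks = [0.9, 0.1, 0.2, 0.8]
print(auc_at_horizon(times, events, risks, horizon=5.0))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking of those who die before the horizon above those who survive it.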
Finally, we calculated the population attributable fraction (PAF) for all categorical predictors in the multivariable model. PAFs incorporate both the hazard and prevalence of a risk factor to assess the fraction of total disease risk in the population that would be eliminated (or added) if the risk factor were eliminated from the population (i.e., if all individuals with that risk factor were moved to the reference category) [37].
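How hazard and prevalence combine in a PAF can be shown with Levin's textbook formula, PAF = p(HR - 1) / (1 + p(HR - 1)), where p is the exposure prevalence. This Python sketch is an assumption for illustration: it treats the hazard ratio as a relative risk, it may differ from the method in [37], and it should not be expected to reproduce the PAFs reported in this study. The input values are hypothetical.

```python
def levin_paf(prevalence, hazard_ratio):
    """Levin's population attributable fraction for one binary exposure.

    prevalence:   proportion of the population exposed, between 0 and 1
    hazard_ratio: hazard ratio versus the reference category, used here
                  as an approximation of the relative risk
    A protective exposure (hazard ratio below 1) yields a negative value.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if hazard_ratio <= 0.0:
        raise ValueError("hazard_ratio must be positive")
    excess = prevalence * (hazard_ratio - 1.0)
    return excess / (1.0 + excess)


# Hypothetical exposure: half the population exposed, hazard ratio of 2.
print(round(levin_paf(0.5, 2.0), 3))
```

The formula makes clear why a modest hazard ratio can still yield a sizeable PAF when the exposure is very common, and why a large hazard ratio contributes little when the exposure is rare.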

Patient and public involvement
Local residents, physicians, elders, religious leaders, and university physicians were deeply involved in the design and implementation of the Golestan Cohort Study. Details can be found in S1 File.

Distribution of spatial environmental factors
Across all geocodes, average annual fine particulate matter air pollution exposures were 33.5 μg/m3 for the 5 years prior to enrollment (167.7 μg/m3 cumulatively, SD 17.5 μg/m3). Most households burned kerosene fuel (71%), of which 42% had a chimney for ventilation. A total of 7% of households used biomass fuels (typically wood or dung burned indoors for cooking or heating), of which 81% had a chimney. The remainder of households providing responses used either gas (12%) or mixed fuels (9%). One third of participants (34%) lived close to major highways. Distances to the nearest percutaneous coronary intervention center averaged 92.2 km (SD 37.4 km). The socioeconomic status score varied with location, with lower scores concentrated in the northeast. Local population density averaged 1732 persons per square kilometer (SD 3069). The intensity of nighttime light averaged 22.7 (SD 22.2) on a NOAA intensity metric. Both population density and nighttime light exposure were highest in Golestan's central agricultural valley. Most households were located amidst cropland (57%) or urban settings (25%), with the remainder located among shrubland (9%), grassland (9%), or barren earth (<1%). The derivation and validation cohorts were similar in the distribution of all SEFs (Table 2). The spatial distribution of the SEFs is illustrated in Fig 2.

Mortality data
In the derivation cohort (n = 45,042), there were 2733 cardiovascular deaths and 5996 all-cause deaths. In the validation cohort (n = 5003), there were 286 cardiovascular deaths and 655 all-cause deaths. The mean follow-up time in the derivation and validation cohorts was 10.2 years and 10.9 years, respectively. Mean time to all-cause death in the derivation and validation cohorts was 6.2 years and 6.1 years, respectively. Mean time to cardiovascular death in both cohorts was 5.8 years. In both derivation and validation cohorts, all-cause mortality was 13% and cardiovascular mortality was 6% over the follow-up period.
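The rounded percentages follow directly from the counts above, as this short Python check shows. The function name is a hypothetical helper; the counts are those reported in this section.

```python
def cumulative_mortality(deaths, cohort_size):
    """Crude proportion of the cohort that died during follow-up."""
    return deaths / cohort_size


# (cohort size, all-cause deaths, cardiovascular deaths) as reported above.
counts = {
    "derivation": (45042, 5996, 2733),
    "validation": (5003, 655, 286),
}
for name, (n, all_cause, cardiovascular) in counts.items():
    print(name,
          f"all-cause {cumulative_mortality(all_cause, n):.1%}",
          f"cardiovascular {cumulative_mortality(cardiovascular, n):.1%}")
# derivation all-cause 13.3% cardiovascular 6.1%
# validation all-cause 13.1% cardiovascular 5.7%
```

These are crude proportions over the whole follow-up period and do not account for differences in follow-up time between individuals.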

Individual characteristics predict mortality
Individual characteristics predicted all-cause and cardiovascular mortality in the derivation cohort (Table 3). The following variables demonstrated increased hazard for both all-cause and cardiovascular mortality: older age, male gender, being unmarried, lower socioeconomic status, illiteracy, higher waist circumference (adjusted for hip circumference), lower hip circumference (adjusted for waist circumference), lower physical activity, tobacco use, opium use, and history of hypertension, diabetes, IHD, and stroke. Turkmen ethnicity was associated with increased hazard for all-cause mortality, but not for cardiovascular mortality.

Spatial environmental factors predict mortality
We identified several SEFs that predicted mortality in the multivariable derivation model. Ambient fine particulate matter air pollution predicted all-cause mortality (per μg/m3, HR 1.20, 95% CI 1.07 to 1.36) and cardiovascular mortality (HR 1.17, 95% CI 0.98 to 1.39). Exposure to household air pollution also demonstrated predictive power. Specifically, the use of household kerosene fuel without a chimney compared to gas predicted cardiovascular mortality (HR 1.19, 95% CI 1.01 to 1.41) and all-cause mortality (HR 1.09, 95% CI 0.97 to 1.23). Additionally, the use of biomass fuel without a chimney compared to gas predicted both cardiovascular mortality (HR 1.36, 95% CI 0.99 to 1.87) and all-cause mortality (HR 1.23, 95% CI 0.99 to 1.53). Greater distance to PCI centers also was associated with increased hazard of both cardiovascular mortality (per 10 km, HR 1.02, 95% CI 1.004 to 1.03) and all-cause mortality (per 10 km, HR 1.01, 95% CI 1.004 to 1.02). Proximity to traffic also increased the hazard of cardiovascular mortality (HR 1.13, 95% CI 1.01 to 1.27).
The remaining SEFs (neighborhood socioeconomic status, local population density, nighttime light, and land use) did not demonstrate relationships with either all-cause or cardiovascular mortality.

Spatial environmental factors add predictive power
The addition of the eight SEFs improved the predictive power of the model beyond traditional risk factors. Goodness of fit statistics (chi-squared statistics and corresponding p-values) were
Fig 2. Spatial models for six spatial environmental factors (SEFs) across Golestan Province, Iran. These models were used to assign environmental exposures to individuals based on their location of residence. Methodologies are described in S1 Table in S1 File. A. Land use based on satellite imagery [28]. B. Proportion of households burning biomass fuels without a chimney. Estimates derived from Gaussian process regression (kriging) of average fuel use patterns for each village geocode [20]. C. Proximity to traffic, defined as within 100m of a minor highway or within 500m of a major highway [20]. D. Proximity to percutaneous coronary intervention centers [20]. E. Average socioeconomic status index score. Estimates derived from kriging of median socioeconomic score for each geocode [23]. F. Average annual intensity of light-at-night using satellite imagery [27]. Models developed in ArcGIS [38]. All displayed data is either in the public domain, collected by the investigators, or for illustrative purposes only.

Stratification by sex
A sex-stratified analysis found no effect modification of the relationship between SEFs and either all-cause or cardiovascular mortality.

Adjusting for spatial autocorrelation
After adjustments for spatial autocorrelation, there was no meaningful change in the magnitude or direction of the point estimates for the eight SEFs. The spatial autocorrelation model was consistent with the original frailty estimates.

Model validation
The model incorporating SEFs was tested on a novel validation cohort. In this validation cohort, the spatial frailty model effectively predicted mortality, with time-dependent areas under the curve of 0.76 for all-cause mortality and 0.81 for cardiovascular mortality.
Table 3. Hazard ratios for cardiovascular and all-cause mortality associated with individual risk factors in the multivariable spatial environmental model. SES = socioeconomic status. IHD = ischemic heart disease. CVA = cerebrovascular accident (stroke).

Statement of principal findings
In this analysis of the Golestan Cohort Study, several SEFs predicted cardiovascular or all-cause mortality in the spatial survival model after adjusting for individual risk factors. When applied to a novel validation cohort, the model effectively predicted both cardiovascular and all-cause mortality.
Ambient air pollution. In our model, ambient air pollution levels were associated with all-cause and cardiovascular mortality after adjusting for other environmental risk factors in the multivariable model. Additionally, PAFs identified a large burden of mortality attributable to ambient air pollution. This is consistent with the existing literature demonstrating associations between ambient PM2.5 and both cardiovascular and all-cause mortality [3][4][5][6][7].
Household air pollution. We found that household kerosene fuel use without a chimney was associated with both cardiovascular mortality and all-cause mortality. This is consistent with a previous study of the GCS that found associations between cumulative kerosene exposure and both cardiovascular and all-cause mortality [39]. Additionally, we observed modest evidence of an association between the use of biomass fuel without a chimney and both cardiovascular and all-cause mortality. This is consistent with the existing literature on the cardiovascular effects of indoor burning of solid biomass [3,5,6]. Finally, PAF calculations illustrate a high burden of cardiovascular and all-cause mortality attributable to biomass and kerosene fuel burning.
Proximity to traffic. In our model, residence close to highways increased the hazard of cardiovascular mortality. Although this hazard ratio was only 1.13, many individuals live near major roadways, resulting in a PAF of 0.11 for cardiovascular mortality. Importantly, this association was observed after adjusting for ambient air pollution and socioeconomic environment in the multivariable model. This suggests that the calculated hazard may be due chiefly to traffic-related noise or to another unmeasured variable associated with proximity to traffic. These findings are consistent with prior studies that have demonstrated associations between proximity to traffic and cardiovascular events, particularly IHD and stroke [7,9].
Distance to health care services. We found that greater distance to PCI centers increased the hazard of both cardiovascular and all-cause mortality. These results align with a previous cohort study demonstrating that distance to hospital is an independent predictor of mortality among patients with incident myocardial infarction (MI) in the community [11].
Several other SEFs did not demonstrate clear relationships in the multivariable model: neighborhood socioeconomic status, local population density, nighttime light, and land use. Low socioeconomic status neighborhood environment previously was shown to predict cardiovascular events [12,40], and did so in our univariate model for cardiovascular mortality, but not in the multivariable environmental model. This suggests that socioeconomic environment may be a proxy for other environmental risk factors (e.g., air pollution) that were included in our model.

Strengths and limitations
A key strength of our model was the simultaneous testing of multiple environmental risk factors. To date, most studies of environmental risk look at a single environmental risk factor against a background of individual characteristics. This can lead to confounding by other unmeasured environmental factors. In this study, we incorporate a diversity of spatially resolved SEFs in a prospective model predicting cardiovascular mortality. Additionally, the study was performed in a rural, low-income setting, helping to bridge a gap in the medical literature.
This study also benefitted from the use of the GCS dataset, which included a large sample size and excellent follow-up rate, as well as systematic methods to identify mortality and attributable causes of death. Additionally, our spatial random effects survival model controlled for possible spatial dependency in the data.
Our study had several limitations. First, exposures were assigned according to village- or neighborhood-level geocodes, rather than specific home addresses (due to human subjects-related privacy considerations). This could result in exposure misclassification and bias our findings towards the null. However, given the small sizes of villages, we estimate the average distance between assigned geocodes and true home address to be less than 500 meters. With the exception of proximity to traffic, none of our modeled environmental exposures vary dramatically over this distance.
Second, the spatial environmental factors in our model were assessed at the year of enrollment. This may misclassify exposure for participants exposed to SEFs prior to enrollment, for participants migrating to new locations with different exposures, or if the exposure varied over time. Similarly, given that environmental exposures were assigned at enrollment, our model does not account for acute exposures that may result in acute cardiovascular events (e.g. acute air pollution exposure triggering coronary plaque rupture). Again, we would expect these factors to bias our findings towards the null.
Third, household fuel use and ventilation were used as proxies for air pollution exposure. Although not optimal, this method has been used previously to study air pollution exposures in Golestan [39], as well as in studies on the association between household air pollution and childhood mortality [41].
Fourth, the socioeconomic environment variable used wealth indices from study participants and could be inaccurate if the mean wealth index at each geocode is not representative of the community.
Finally, although we have adjusted for eight SEFs in the environmental model, there remains the possibility of residual confounding from unmeasured SEFs (e.g. climate; temperature variation; noise pollution; toxins in water and food) and individual risk factors (e.g. dyslipidemia).

Conclusions
We tested the prospective associations between eight environmental factors and cardiovascular and all-cause mortality in a large cohort in a low-income setting. Our findings demonstrate that the burden of disease attributable to the environment may be as large as that attributable to traditional cardiovascular risk factors, and thus represents a critical opportunity for targeted policies and programs. Furthermore, these findings illustrate the utility and feasibility of incorporating environmental data in survival models, even in low-income settings.
A growing literature illustrates how health care providers, governments, and charities can identify and intervene on environmental exposures at the individual and population levels [8,42,43]. We anticipate that these findings and analytic approach will stimulate further studies to promote better health for populations and the environment worldwide.